Efficient Memory Representation of XML Documents

Authors

  • Giorgio Busatto
  • Markus Lohrey
  • Sebastian Maneth
Abstract

Implementations that load XML documents and expose them through an interface such as the DOM suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the document itself. A considerable share of that memory goes to storing the document's tree structure. This paper presents a technique for representing the tree structure of an XML document efficiently. The representation exploits the high regularity of XML documents by “compressing” their tree structure, that is, by detecting and removing repeated tree patterns. Basic tree operations, such as traversal along edges, remain available on the compressed representation, so queries (and in particular bulk operations) can be executed directly, without prior decompression. For certain tasks, such as validation against an XML type or checking the equality of documents, the representation admits provably more efficient algorithms than those that run on conventional representations.
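The idea of removing repeated structure can be illustrated with a minimal sketch. It shows only the simplest case, sharing identical subtrees so that the tree becomes a DAG, and should not be read as the authors' implementation, which targets more general tree patterns. The function names `compress` and `count_tree_nodes` are hypothetical, and text content and attributes are ignored so that only the element structure is compared.

```python
import xml.etree.ElementTree as ET


def compress(xml_text):
    """Share identical subtrees (structure only; text and attributes are ignored).

    Returns (root_id, nodes), where nodes[i] == (tag, tuple_of_child_ids).
    Children always receive smaller ids than their parents.
    """
    table = {}   # (tag, child_ids) -> node id
    nodes = []   # node id -> (tag, child_ids)

    def visit(elem):
        # Recursive for brevity; very deep documents would need an explicit stack.
        key = (elem.tag, tuple(visit(child) for child in elem))
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    return visit(ET.fromstring(xml_text)), nodes


def count_tree_nodes(root_id, nodes):
    """Size of the original tree, computed on the DAG without unfolding it."""
    sizes = [0] * len(nodes)
    for i, (_tag, children) in enumerate(nodes):
        sizes[i] = 1 + sum(sizes[c] for c in children)
    return sizes[root_id]


doc = "<lib>" + "<book><title/><author/></book>" * 3 + "</lib>"
root_id, nodes = compress(doc)
print(len(nodes))                         # 4 distinct subtrees are stored
print(count_tree_nodes(root_id, nodes))   # 10 element nodes in the original tree
```

Following child ids in `nodes` corresponds to traversal along edges, so a bulk computation such as `count_tree_nodes` visits each shared subtree once. In this sketch, two documents compressed into the same table have equal structure exactly when their root ids are equal.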

Similar resources

Efficient memory representation of XML document trees

Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. In this paper, a technique is presented that allows representing the tree structure of a...

Frozen streams: an experimental time- and space-efficient implementation for in-memory representation of XML documents using Java

As XML becomes a pervasive technology for data storage and processing, many adopters of the technology face a practical problem caused by the perceived slow performance of many XML processing operations, particularly in comparison to the tried and trusted RDBMS-based solutions that are being replaced. Earlier this year, a lengthy thread on the xml-dev mailing list on XML Performance agonised over X...

Expeditious XML Processing

The efficiency of an XML processor is highly dependent on the representation of the XML document in the computer's memory. We present a representation for XML documents, derived from Warren's representation of Prolog terms in WAM, which permits very efficient access and update operations. Our scheme is implemented in CXMLParser, a non-validating XML processor. We present the results of a perfor...

Xml Mining: from Trees to Strings

XML has in recent years become the standard for data exchange on the Web and a new data description language. Consequently, in a data mining context, optimizing storage and access time for XML documents is becoming a new challenge. Indeed, to mine XML documents we have to parse them in order to obtain a tree data structure in RAM. This tree structure is more flexible and has a better ti...

Building and Searching an XML-Based Corporate Memory

No matter who uses a corporate memory or how it is constructed, information search through that memory should be efficient and effective. In particular, it should adapt to the users' needs, activities, and work environments. For a document-based corporate memory distributed through the Web, which is our research area, these requirements raise two main questions: How will we describe the document...

Publication date: 2005